Expert-Based Software Measurement Data Analysis with Clustering Techniques
Abstract
Software quality estimation models, which predict the fault-proneness of software modules from software metrics, are often constructed by training a classifier on labeled software metrics data. Two challenges often encountered in building an accurate model are the presence of “noisy” data and the possible unavailability of fault-proneness labels in real-world projects. The performance of a model often improves if outliers and noise are removed from the training data. More importantly, a classifier cannot be trained without fault-proneness labels. This article describes an exploratory analysis method that addresses both challenges by combining clustering with the judgment of a software engineering expert. The method is unsupervised, since labeled training data are not required to predict the fault-proneness of software modules. We present two real-world case studies to verify the effectiveness of the clustering- and expert-based approach in predicting both the fault-proneness of software modules and potentially “noisy” (e.g., mislabeled) modules.
Keywords: Software Quality Estimation, Exploratory Data Analysis, Clustering, Noise Detection
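The general idea of clustering modules and letting an expert label each cluster can be sketched as follows. This is a minimal illustration, not the article's procedure: the abstract does not name the clustering algorithm, so plain k-means is an assumption, and the two metrics, the toy data, the thresholds, and the `expert_label` function (a stand-in for a human expert inspecting cluster summaries) are all hypothetical. Feature scaling is omitted for brevity.

```python
from math import dist  # Euclidean distance, Python 3.8+


def kmeans(points, k, iters=20):
    """Plain k-means; the first k points seed the centroids (deterministic)."""
    centroids = [tuple(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        assign = [min(range(k), key=lambda c: dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return assign, centroids


def expert_label(centroid):
    """Hypothetical stand-in for the expert's judgment of one cluster.

    The thresholds are invented for illustration only.
    """
    loc, complexity = centroid
    if loc > 300 and complexity > 10:
        return "fault-prone"
    return "not fault-prone"


# Toy modules: (lines of code, cyclomatic complexity). Invented data.
modules = [(50, 2), (900, 30), (60, 3), (850, 28), (55, 2), (950, 35)]

# Labels as they might appear in a project database; only used to look
# for possible noise, never to build the clusters.
recorded = ["not fault-prone", "fault-prone", "not fault-prone",
            "fault-prone", "fault-prone", "fault-prone"]

assign, centroids = kmeans(modules, k=2)
cluster_labels = [expert_label(c) for c in centroids]

# Every module inherits the label the expert gave its cluster.
predicted = [cluster_labels[a] for a in assign]

# A module whose recorded label disagrees with its cluster's label is
# flagged as potentially mislabeled and left for the expert to review.
suspects = [i for i, (p, r) in enumerate(zip(predicted, recorded)) if p != r]

print(predicted)
print(suspects)
```

No labels are needed to form the clusters or to predict fault-proneness; recorded labels, where they exist, serve only to surface modules worth a second look.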
Similar resources
Combination of Transformed-means Clustering and Neural Networks for Short-Term Solar Radiation Forecasting
In order to provide an efficient conversion and utilization of solar power, solar radiation data should be measured continuously and accurately over the long-term period. However, the measurement of solar radiation is not available to all countries in the world due to some technical and fiscal limitations. Hence, several studies were proposed in the literature to find mathematical and physical mod...
Classification of encrypted traffic for applications based on statistical features
Traffic classification plays an important role in many aspects of network management, such as identifying the type of transferred data, detecting malware applications, and applying policies to restrict network access. Early methods in this field used obvious traffic features such as port number and protocol type to classify traffic. However, recent changes in applicat...
A partition-based algorithm for clustering large-scale software systems
Clustering techniques are used to extract the structure of software for understanding, maintaining, and refactoring. In the literature, most of the proposed approaches for software clustering are divided into hierarchical algorithms and search-based techniques. In the former, clustering is a process of merging (splitting) similar (non-similar) clusters. These techniques suffered from the drawba...
An Optimization K-Modes Clustering Algorithm with Elephant Herding Optimization Algorithm for Crime Clustering
The detection and prevention of crime, in the past few decades, required several years of research and analysis. However, today, thanks to smart systems based on data mining techniques, it is possible to detect and prevent crime in considerably less time. Classification and clustering-based smart techniques can classify and cluster the crime-related samples. The most important factor in the c...
Choosing the Best Hierarchical Clustering Technique Based on Principal Components Analysis for Suspended Sediment Load Estimation
1- INTRODUCTION The assessment of watershed sediment load is necessary for controlling soil erosion and reducing the potential of sediment production. Different estimates of sediment amounts, along with the lack of long-term measurements, limit access to reliable data series of erosion rate and sediment yield. Therefore, the observed data of suspended sediment load could be used to  ...